Formatting: Reject leading, trailing, and consecutive dots in the email local part #12188
nimesh-xecurify wants to merge 1 commit
Enforces dot-atom syntax for the local part of email addresses in `WP_Email_Address`. This ensures that leading, trailing, and consecutive dots are rejected, aligning validation with RFC 5321/5322 and PHP's `FILTER_VALIDATE_EMAIL` because such addresses are generally undeliverable.
Description
`is_email( 'abc..def@xyz.com' )` returns the address as valid, even though the local part contains consecutive dots. PHP's `filter_var( …, FILTER_VALIDATE_EMAIL )` rejects it, and so does every SMTP server, because the address is undeliverable. The same applies to a leading dot (`.abc@xyz.com`) and a trailing dot (`abc.@xyz.com`).

Since [62225]/#31992, `is_email()` delegates to `WP_Email_Address::from_string()`, whose local-part patterns are derived from the WHATWG `<input type=email>` character set. That set places `.` in the character class with no positional constraint (`[…]+`), so any arrangement of dots passes. The domain side already validates correctly (each label must be bookended by an alphanumeric), so this only affected the local part.

This PR restructures the local part as a proper dot-atom, mirroring the existing domain-label design already in the class: a dot may only separate non-empty atoms. Leading, trailing, and consecutive dots are now rejected, while all previously valid addresses, including the intended Unicode addresses from #31992 (`grå@grå.org`), continue to validate.

Why reject these
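One quick way to see the mismatch described above is to run the three address shapes through PHP's built-in validator. The sketch below is only an illustration: the helper name `sketch_php_accepts()` is hypothetical and not part of WordPress or of this PR.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): reports whether PHP's
 * built-in FILTER_VALIDATE_EMAIL filter accepts an address.
 */
function sketch_php_accepts( string $address ): bool {
	// filter_var() returns the filtered string on success and false on failure.
	return false !== filter_var( $address, FILTER_VALIDATE_EMAIL );
}

// The three local-part shapes discussed in this PR, plus one control address.
$samples = array(
	'abc..def@xyz.com', // consecutive dots
	'.abc@xyz.com',     // leading dot
	'abc.@xyz.com',     // trailing dot
	'abc.def@xyz.com',  // control: a single interior dot
);

foreach ( $samples as $address ) {
	printf( "%-18s => %s\n", $address, sketch_php_accepts( $address ) ? 'accepted' : 'rejected' );
}
```

Only the control address should come back as accepted, which is the behavior the PR description attributes to `FILTER_VALIDATE_EMAIL`.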
This change is not a departure from #31992 — it implements a decision that ticket's discussion already reached. While reviewing test cases on #31992, @agulbra noted:
and @peteresnick — the author of RFC 5322 — confirmed:
The WHATWG character set was adopted as the basis for the allowed characters, but the agreed intent was to be at least as strict as the syntaxes used to generate and deliver mail, which forbid empty atoms in the local part. The shipped regex used the WHATWG `[…]+` shorthand verbatim and did not enforce that structure; this PR closes that gap.

Implementation
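As a rough illustration of the dot-atom shape this PR describes (non-empty atoms separated by single dots), here is a minimal ASCII-only sketch. The constant `SKETCH_ATOM_ASCII` and the function `sketch_is_dot_atom()` are hypothetical names; the character set is assumed to be the WHATWG local-part set with the dot removed, and none of it is copied from `WP_Email_Address`.

```php
<?php
// Assumed atom character set: the WHATWG <input type=email> local-part
// characters with "." removed. Hypothetical; not the constant from the PR.
const SKETCH_ATOM_ASCII = '[a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~-]+';

/**
 * Hypothetical helper: true when $local_part is one or more non-empty
 * atoms joined by single dots, i.e. atom (?:\. atom)*.
 */
function sketch_is_dot_atom( string $local_part ): bool {
	// \A and \z anchor to the absolute start and end of the string,
	// so a trailing newline cannot slip through.
	$pattern = '/\A' . SKETCH_ATOM_ASCII . '(?:\.' . SKETCH_ATOM_ASCII . ')*\z/';
	return 1 === preg_match( $pattern, $local_part );
}

var_dump( sketch_is_dot_atom( 'abc.def' ) );  // bool(true)
var_dump( sketch_is_dot_atom( 'abc..def' ) ); // bool(false): empty atom between the dots
var_dump( sketch_is_dot_atom( '.abc' ) );     // bool(false): empty leading atom
var_dump( sketch_is_dot_atom( 'abc.' ) );     // bool(false): empty trailing atom
```

Because the dot is no longer a member of the character class, it can only appear as the separator inside the repeated group, which is what rules out leading, trailing, and consecutive dots.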
- Added `LOCAL_PART_ATOM_ASCII` and `LOCAL_PART_ATOM_UNICODE` constants (the previous local-part character sets, minus the dot).
- Rebuilt `LOCAL_PART_ASCII_REGEX` / `LOCAL_PART_UNICODE_REGEX` as `atom (?:\. atom)*`, the same dot-separated structure already used by `DOMAIN_ASCII_REGEX` / `DOMAIN_UNICODE_REGEX`.

Testing instructions
`false`:

`npm run test:php -- --group 55821`
`npm run test:php -- --filter 'IsEmail|EmailAddress|Antispambot|SanitizeEmail'`

Unit tests
- Moved `..@example.com` from the valid provider to the invalid provider, added the leading/trailing/consecutive-dot cases, and tagged the test with `@ticket 55821`.
- Added a `@ticket 55821` test asserting rejection in both unicode and ascii modes.
- (`antispambot()` is obfuscation-only; behavior unchanged).

Trac ticket: https://core.trac.wordpress.org/ticket/55821
Use of AI Tools
AI assistance: Yes — used to investigate the regression, cross-reference the #31992 discussion, draft the fix, and write the tests. All changes were reviewed and verified by me.